Version: 4.0
info

Variables used in operators need to be configured in advance during task development.

Data Input Node

Data Input

  • supOS instance
    • Select the template, instance, and attribute from the supOS instance you added.
    • Enable History to add attribute history information, such as alias, value, and time, as output fields.

  • File data source
    • HDFS is not supported.
    • If you select File match as the File selection method, you can use the wildcard * with a file suffix to match file names, for example test*.xlsx.
    • If you select File upload as the File selection method and the file contains Chinese characters, save the file in UTF-8 encoding.
info

When the input data source is a file, the published task reads the file again and outputs new results whenever the file changes.

  • API data source
| Item | Description |
| --- | --- |
| Data source type | WebService: Output fields are matched automatically. Variables can be configured in the XML text with the # symbol as the identifier, and each variable must exist in the task global variables. RESTFul: Output fields are matched automatically. Variables can be configured in the request URL, request headers, and JSON text, and each variable must exist in the task global variables. |
| Request address | Enter the API request address and use #{} to identify variables. |
| Request header param | Set the header parameters of the API. When Processing method is set to Variable, you can select the value from the global variables. |
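The #{} placeholder rule for the request address can be illustrated with a short sketch. It assumes that each #{name} is replaced by the value of the global variable name, and that an undefined variable is an error (mirroring the rule that variables must exist in the task global variables). The helper resolveVariables, the variables startTime and tagName, and the example URL are hypothetical, not supOS APIs.

```javascript
// Hypothetical helper (not a supOS API): replaces each #{name} placeholder
// with the value of the task global variable "name".
function resolveVariables(template, globals) {
  return template.replace(/#\{([^}]+)\}/g, function (match, name) {
    // Assumption: an undefined variable is an error, mirroring the rule
    // that every variable must exist in the task global variables.
    if (!Object.prototype.hasOwnProperty.call(globals, name)) {
      throw new Error("Undefined global variable: " + name);
    }
    return String(globals[name]);
  });
}

// Example with hypothetical global variables startTime and tagName.
var requestUrl = resolveVariables(
  "http://example.com/api/history?start=#{startTime}&tag=#{tagName}",
  { startTime: "2024-01-01", tagName: "T101" }
);
```

The sketch does no URL encoding of substituted values; whether supOS encodes them is not stated here.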

Message Queue Input

Collects data from a message queue, maps it to a two-dimensional table, and delivers it to downstream operators for processing.

| Item | Description |
| --- | --- |
| Data source type | Only Kafka and RocketMQ are available. |
| Topic | Select the topic of the data to be added. You can click Data preview to view the data. |
| Field settings | Click New field to add fields for the output information. |
| Initial offset | earliest: On each scheduled run, if no offset has been committed, consumption starts from the beginning; otherwise, it starts from the most recently committed offset. latest: If no offset has been committed, consumption starts from newly arriving data, and data produced earlier is not consumed. |
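The Initial offset rule can be sketched as a small decision function. This sketch follows the semantics of Kafka's auto.offset.reset setting, which applies only when no offset has been committed; treating supOS the same way (including resuming from a committed offset under latest) is an assumption. The function startOffset and its parameters are hypothetical.

```javascript
// Hypothetical sketch, modeled on Kafka auto.offset.reset semantics.
// committed: last committed offset, or null if none has been committed.
// earliestAvailable: first offset still retained in the topic partition.
// logEnd: offset that the next produced message will receive.
function startOffset(setting, committed, earliestAvailable, logEnd) {
  if (committed !== null) {
    return committed; // a committed offset always wins (assumed for both settings)
  }
  if (setting === "earliest") {
    return earliestAvailable; // start from the beginning of retained data
  }
  if (setting === "latest") {
    return logEnd; // skip historical data, consume only new messages
  }
  throw new Error("Unknown initial offset setting: " + setting);
}
```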

Data Warehouse Input

Supports tables and views in ODS, DW, and the local data warehouse.

info

Install X-DAM in advance; otherwise, Local data warehouse is not available.

| Item | Description |
| --- | --- |
| Data source type | Currently, only Local data warehouse is available, and it requires X-DAM. |
| Data warehouse level | Select the level of the data in the data warehouse. |
| Data sheet | Select the data sheet. |

Data Processing Node

Data Quality

| QC Rule | Description |
| --- | --- |
| Null check | Select a field and check whether it has null values, then choose Discard, Fix, or Ignore as the follow-up action. |
| Range check | Select a field and set a value range; only values within the range are kept. |
| Data format check | Select a field and verify its data format with a built-in or custom expression. |
| Enumeration value check | Select a field and specify a set of values to check whether the field value is one of the enumerated values. |
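The four QC rules can be sketched as plain functions over an array of records. This is an illustration only: the function names and parameters are hypothetical, inclusive range bounds are assumed, "Fix" is assumed to write a fixed replacement value, and records that fail a format or enumeration check are assumed to be dropped, since this section does not say what happens to them.

```javascript
// Hypothetical sketch of the four QC rules; not the supOS implementation.

// Null check with a follow-up action: "Discard", "Fix", or "Ignore".
function nullCheck(records, field, action, fixValue) {
  var out = [];
  records.forEach(function (r) {
    var isNull = r[field] === null || r[field] === undefined;
    if (!isNull || action === "Ignore") {
      out.push(r);
    } else if (action === "Fix") {
      var fixed = {};
      for (var k in r) fixed[k] = r[k]; // copy so the input is not mutated
      fixed[field] = fixValue;
      out.push(fixed);
    }
    // "Discard": records with a null value are not added to the output.
  });
  return out;
}

// Range check: keep only numeric values within [min, max] (inclusive bounds assumed).
function rangeCheck(records, field, min, max) {
  return records.filter(function (r) {
    var v = r[field];
    return typeof v === "number" && v >= min && v <= max;
  });
}

// Data format check: keep values that match a regular expression (use no /g flag).
function formatCheck(records, field, pattern) {
  return records.filter(function (r) {
    return r[field] != null && pattern.test(String(r[field]));
  });
}

// Enumeration value check: keep values that are one of the allowed values.
function enumCheck(records, field, allowed) {
  return records.filter(function (r) {
    return allowed.indexOf(r[field]) !== -1;
  });
}
```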

SQL Execution

Uses SQL statements to perform simple operations, such as insert, delete, and update, on relational data sources.

Data Set

Renames data fields or configures field mappings to generate a new data set. It is mainly used to redefine the data structure during data processing.
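A field mapping of this kind can be sketched as follows. The mapping object (output field name to source field name) and the function applyFieldMapping are hypothetical, and dropping unmapped source fields is an assumption about how the new data set is built.

```javascript
// Hypothetical sketch: build a new data set whose fields are renamed
// from source fields. mapping = { outputField: sourceField, ... }.
// Assumption: source fields that are not mapped are dropped.
function applyFieldMapping(records, mapping) {
  return records.map(function (r) {
    var out = {};
    Object.keys(mapping).forEach(function (target) {
      out[target] = r[mapping[target]];
    });
    return out;
  });
}

// Example: rename t -> tagName and val -> value.
var renamed = applyFieldMapping(
  [{ t: "T1", val: 3, extra: 1 }],
  { tagName: "t", value: "val" }
);
```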

Data Filtering

Multiple filter conditions can be set for a single field. Only data that meets all of the conditions passes the filter.

  • in/not in: For numeric fields. The value is a comma-separated list of numbers. in matches when the field value equals one of the numbers; not in matches when it equals none of them.
  • like/not like: For text fields. like matches when the field value contains the given text, in part or in full; not like matches when it does not.
  • between/not between: For date fields. between matches when the date falls within the set range; not between matches when it falls outside the range.
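The operators above can be sketched as a single matching function. The sketch assumes that records matching all conditions are kept, that like means substring containment, and that between uses inclusive bounds; the condition object shape ({ op, values, text, from, to }) and all function names are hypothetical.

```javascript
// Hypothetical sketch of the filter operators on a single field.
function matches(value, cond) {
  switch (cond.op) {
    case "in":
      return cond.values.indexOf(value) !== -1;
    case "not in":
      return cond.values.indexOf(value) === -1;
    case "like": // assumed: substring containment
      return String(value).indexOf(cond.text) !== -1;
    case "not like":
      return String(value).indexOf(cond.text) === -1;
    case "between": // assumed: inclusive bounds; works for Date and numbers
      return value >= cond.from && value <= cond.to;
    case "not between":
      return value < cond.from || value > cond.to;
    default:
      throw new Error("Unsupported operator: " + cond.op);
  }
}

// Keep a record only when every condition on the field is met.
function filterRecords(records, field, conditions) {
  return records.filter(function (r) {
    return conditions.every(function (c) { return matches(r[field], c); });
  });
}

// "in" values are entered as a comma-separated list, e.g. "1, 5".
function parseNumberList(text) {
  return text.split(",").map(function (s) { return Number(s.trim()); });
}
```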

Data Connection

Integrates data from multiple tables into one.

  1. Drag 2 data input operators and configure them with 2 relational data tables.
  2. Drag the Data connection operator and connect both data sources to it.
  3. Double-click Data connection, and then configure its connection relation.
    1. Select data sources, and then click to select the join relation.
    2. Click the data source to be matched.
    3. Click New to add the match field.
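Conceptually, the match field configured in these steps acts as a join key. The sketch below shows an inner join and a left join on one match field per table; the join relations that supOS actually offers, the names joinTables and combineRows, and the rule that right-table fields win on name clashes are assumptions.

```javascript
// Hypothetical sketch of joining two tables on a match field.
function joinTables(left, right, leftKey, rightKey, joinType) {
  // Index the right table by its match field.
  var index = Object.create(null);
  right.forEach(function (r) {
    var key = String(r[rightKey]);
    if (!index[key]) index[key] = [];
    index[key].push(r);
  });
  var out = [];
  left.forEach(function (l) {
    var matched = index[String(l[leftKey])] || [];
    matched.forEach(function (r) { out.push(combineRows(l, r)); });
    // A left join keeps unmatched left rows; an inner join drops them.
    if (matched.length === 0 && joinType === "left") {
      out.push(combineRows(l, {}));
    }
  });
  return out;
}

// Merge two rows; right-table fields overwrite same-named left fields (assumption).
function combineRows(a, b) {
  var row = {};
  var k;
  for (k in a) row[k] = a[k];
  for (k in b) row[k] = b[k];
  return row;
}
```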

Data Sorting

info

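Sorting priority decreases from the top of the field list to the bottom: the first field has the highest priority, and each later field is used only to order records that are equal on all earlier fields.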
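Field-list priority corresponds to a multi-key comparison, sketched below. The function sortRecords and the { field, order } shape are hypothetical; the comparator falls through to the next field only on ties.

```javascript
// Hypothetical sketch: multi-field sort where earlier fields take priority.
function sortRecords(records, sortFields) {
  return records.slice().sort(function (a, b) {
    for (var i = 0; i < sortFields.length; i++) {
      var f = sortFields[i].field;
      var dir = sortFields[i].order === "desc" ? -1 : 1;
      if (a[f] < b[f]) return -1 * dir;
      if (a[f] > b[f]) return 1 * dir;
      // Equal on this field: fall through to the next, lower-priority field.
    }
    return 0;
  });
}

// Sort by line ascending, then by t descending within each line.
var sorted = sortRecords(
  [{ line: "B", t: 2 }, { line: "A", t: 3 }, { line: "A", t: 1 }],
  [{ field: "line", order: "asc" }, { field: "t", order: "desc" }]
);
```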

Data Merging

Merges data from multiple data sources. You can select columns from multiple sample sets and merge them into a data set.

  1. Drag 2 data input operators and configure them with 2 relational data tables.
  2. Drag the Data merging operator and connect both data sources to it.
  3. Double-click Data merging, and then configure its merging relation.
    1. Click New to add a merged field.
    2. Enter the merged field name, and then select the field type and the fields to be merged from both data sources.
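One possible reading of these steps is a union: each merged field takes its value from a selected field in each source, and the rows of both sources are appended into one data set. This row-append interpretation, the per-field type conversion, and the names mergeSources and mergedFields are assumptions for illustration.

```javascript
// Hypothetical sketch of Data merging as a union with per-field mapping.
// mergedFields: [{ name, type: "number" | "string", fields: [fieldInSource1, fieldInSource2] }]
function mergeSources(source1, source2, mergedFields) {
  function project(rows, sourceIndex) {
    return rows.map(function (row) {
      var out = {};
      mergedFields.forEach(function (m) {
        var value = row[m.fields[sourceIndex]];
        if (value == null) {
          out[m.name] = null;
        } else {
          // Convert to the selected field type.
          out[m.name] = m.type === "number" ? Number(value) : String(value);
        }
      });
      return out;
    });
  }
  return project(source1, 0).concat(project(source2, 1));
}

var merged = mergeSources(
  [{ temp: "20.5", tag: "T1" }],
  [{ temperature: 21, sensor: "T2" }],
  [
    { name: "value", type: "number", fields: ["temp", "temperature"] },
    { name: "tag", type: "string", fields: ["tag", "sensor"] }
  ]
);
```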

Missing Value Handling

Replaces null values and empty strings in the data source so that subsequent processing, such as feature queries and modeling, runs smoothly.

  • Replaces each missing value with the maximum, minimum, or average value of the column, a constant value, a global variable, or a linear fitting value.
  • Uses a custom formula to generate replacement values. Only +, -, *, and / are supported.
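The column-statistic and constant strategies can be sketched as follows. Strategy names and the function fillMissing are hypothetical; linear fitting, global variables, and custom formulas are not covered. The statistic is computed from the non-missing values of the column, treating null, undefined, and "" as missing.

```javascript
// Hypothetical sketch of missing-value replacement for one column.
function isMissing(v) {
  return v === null || v === undefined || v === "";
}

function fillMissing(records, field, strategy, constant) {
  // Values that are present, as numbers, for computing statistics.
  var present = records
    .map(function (r) { return r[field]; })
    .filter(function (v) { return !isMissing(v); })
    .map(Number);
  var replacement;
  switch (strategy) {
    case "max": replacement = Math.max.apply(null, present); break;
    case "min": replacement = Math.min.apply(null, present); break;
    case "avg":
      replacement = present.reduce(function (s, v) { return s + v; }, 0) / present.length;
      break;
    case "constant": replacement = constant; break;
    default: throw new Error("Unsupported strategy: " + strategy);
  }
  return records.map(function (r) {
    if (!isMissing(r[field])) return r;
    var copy = {};
    for (var k in r) copy[k] = r[k];
    copy[field] = replacement;
    return copy;
  });
}
```

With no non-missing values in the column, max/min/avg produce -Infinity, Infinity, or NaN in this sketch, so a real implementation would need a fallback.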

Data Type Conversion

Converts the type of input fields.

info

If a value cannot be legally converted to the target type, 0 is output instead.
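The rule that an illegal conversion yields 0 can be sketched for string-to-integer and string-to-float conversions. The function names are hypothetical, and treating a decimal string such as "4.2" as illegal for integer conversion is an assumption; supOS may truncate instead.

```javascript
// Hypothetical sketch: conversions that fall back to 0 on illegal input.
function toInteger(value) {
  var s = String(value).trim();
  // Assumption: only optionally signed digit strings are legal integers.
  return /^[+-]?\d+$/.test(s) ? parseInt(s, 10) : 0;
}

function toFloat(value) {
  var s = String(value).trim();
  var n = Number(s);
  // Empty strings, non-numeric text, and infinities are treated as illegal.
  return s !== "" && isFinite(n) ? n : 0;
}
```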

Data Completion

| Item | Description |
| --- | --- |
| Data completion time granularity | Completes data at the set granularity. For example, if the granularity is set to Second, data is added for every missing second. |
| Completion base time field | Select the time field on which completion is based. |
| Start data | The data from which completion starts. |
| End data | The data at which completion ends. |
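Completion at Second granularity can be sketched as generating one record per step between the start and end times. This section does not say what values the added records carry, so the sketch assumes they copy the previous real record (forward fill). Timestamps are epoch milliseconds, and start is assumed to be aligned to the step; completeSeries and its parameters are hypothetical.

```javascript
// Hypothetical sketch of data completion with forward fill (assumed fill rule).
function completeSeries(records, timeField, stepMs, start, end) {
  var byTime = {};
  records.forEach(function (r) { byTime[r[timeField]] = r; });
  var out = [];
  var last = null;
  for (var t = start; t <= end; t += stepMs) {
    var r = byTime[t];
    if (r) {
      last = r;
      out.push(r);
    } else if (last) {
      // Missing step: copy the previous record and set the new time.
      var filled = {};
      for (var k in last) filled[k] = last[k];
      filled[timeField] = t;
      out.push(filled);
    }
    // Before the first real record there is nothing to copy, so no record is added.
  }
  return out;
}

// Records at 0 s and 3 s; completion adds records at 1 s and 2 s.
var full = completeSeries([{ ts: 0, v: 1 }, { ts: 3000, v: 4 }], "ts", 1000, 0, 3000);
```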

Variable Settings

Assigns new values to the global variables you defined during task development. A new value can be set manually or taken from a field of the input data source.

Data Aggregation

Aggregates data by the configured group fields.

  1. Select fields, and then click Add group field. The aggregated data will be grouped by the set fields.
  2. Select fields of numeric type, and then click Add aggregated field to add fields to be aggregated.
  3. Set Aggregation cycle type.
| Item | Description |
| --- | --- |
| Aggregation cycle type | None: Aggregates all data by the group fields. Time: Aggregates grouped data by time. For example, with the cycle set to 5 minutes, the data is aggregated once for every 5 minutes. Data record: Aggregates grouped data by number of records. For example, with the cycle set to 60 records, the data is aggregated once for every 60 records. |
| Aggregate base time field/Sort field | Select the base field used for sorting. |
| Aggregation cycle | Set the time span or number of records over which each aggregation is calculated. |
| Record time filling method/Record entry method | Set the output value, which is either the initial value or the end value of each cycle. |
| Sliding window | Only available when Aggregation cycle type is set to Time. |
| Additional output data record field | Only available when Aggregation cycle type is set to Data record. Select an additional field to output. |
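The three cycle types can be sketched with fixed (non-sliding) windows and a sum as the aggregate. In this sketch a Time window is labeled by its start time, a Data record window by its index within the group, and None yields one window per group. The function aggregate, its parameters, and the choice of sum are hypothetical; sliding windows and the record-time options are not modeled.

```javascript
// Hypothetical sketch of grouped aggregation with fixed cycles.
// cycleType: "none", "time" (cycle in ms), or "record" (cycle in records).
function aggregate(records, groupField, valueField, timeField, cycleType, cycle) {
  var groups = {};
  var order = [];
  records.forEach(function (r) {
    var g = String(r[groupField]);
    if (!groups[g]) { groups[g] = []; order.push(g); }
    groups[g].push(r);
  });
  var out = [];
  order.forEach(function (g) {
    // Sort each group by the base time/sort field.
    var rows = groups[g].slice().sort(function (a, b) { return a[timeField] - b[timeField]; });
    var sums = {};
    var keys = [];
    rows.forEach(function (r, i) {
      var w;
      if (cycleType === "time") {
        w = Math.floor(r[timeField] / cycle) * cycle; // window start time
      } else if (cycleType === "record") {
        w = Math.floor(i / cycle); // window index within the group
      } else {
        w = 0; // "none": one window per group
      }
      if (!(w in sums)) { sums[w] = 0; keys.push(w); }
      sums[w] += r[valueField];
    });
    keys.forEach(function (w) {
      out.push({ group: g, window: w, sum: sums[w] });
    });
  });
  return out;
}
```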


Table Transposition

info

Only available for file-type input data, and the output is also file-type data.

Custom Operator

You can define a custom algorithm and its functions.

info

Currently, only native JavaScript is supported.

  • JavaScript rules:
    • The input of the JS script is the collection of data passed down from the previous node, which can contain one or multiple records.
    • In the script, this data collection is named "dataList".
    • To obtain the value of a property, use dot notation followed by the property name (e.g., ".id").
    • After performing the required operations, the script should return the resulting object or data collection as a JSON array or string.
    • Variables configured on the page can be used directly in the JS script, as long as the variable names match.
    • You can add debugging logs with the "log.debug()" function and view the results at the bottom of the script editor.
    • Java collection classes can be used within the JS script.
  • Script example
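```javascript
// dataList: the records passed down from the previous node
var arr = [];
for (var i = 0; i < dataList.length; i++) {
  // add 10 to the id field of each record
  dataList[i].id = dataList[i].id + 10;
  arr[i] = dataList[i];
}
// the last expression, arr, is the script's result
arr;
```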
caution
  • When calling a function on an input object, watch out for null values, which may cause a NullPointerException.
  • When operating on input fields, be aware of the object types of the fields. Comparing time objects may make different dates appear equal, so convert them to strings or primitive types before operating on them.
  • When calculating with numeric values, the data type may change unexpectedly.
  • In a JS script, custom variables cannot have the same names as global variables configured during task development.
  • After writing the script, click Data simulation to debug it with sample data. You can add up to 50 records.

  • Debug the script to make sure the results are as expected.
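The cautions above can be combined into a defensive script. This is a sketch, not an official template: the dataList and log stubs at the top exist only so the script can run outside supOS and should be removed in the operator, where the platform provides both. The field names id, temp, and time and the Fahrenheit conversion are hypothetical. Time values are converted with String() so the script does not depend on whether the field arrives as a JavaScript or a Java date object.

```javascript
// Stubs for running outside supOS only (hypothetical sample data).
var dataList = [
  { id: 1, temp: 25, time: new Date(Date.UTC(2024, 0, 1)) },
  { id: 2, temp: null, time: null }
];
var log = { debug: function (msg) { /* supOS shows log output below the editor */ } };

// --- script body ---
var result = [];
for (var i = 0; i < dataList.length; i++) {
  var row = dataList[i];
  // Check for null before using a field, to avoid a NullPointerException.
  var timeText = row.time == null ? "" : String(row.time);
  // Convert explicitly to a number so the arithmetic type is predictable.
  var tempF = row.temp == null ? null : Number(row.temp) * 9 / 5 + 32;
  log.debug("id=" + row.id + ", tempF=" + tempF);
  result.push({ id: row.id, time: timeText, tempF: tempF });
}
// The last expression is the script's result.
result;
```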

Big Data Model

Model Operation

Runs a supOS big data model and outputs the results.

Model Update

Automatically updates the selected updatable big data models and outputs the results.

Generate Sample Set

Generates sample set data and synchronizes it to X-BD sample set management.

info
  • This operator cannot have downstream operators.
  • String data in a sample set cannot contain the English comma (,).

Process Control

Executes the branch tasks whose conditions, as defined by the configured conditional statements, are met.

  1. Connect multiple operators (up to five branches) after the branch task node, and then double-click the branch task to configure its parameters.
  2. Click Settings for each downstream operator, and set its condition.
    info

    Make sure each condition returns a boolean value. A branch task is executed only when its condition returns True.
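Branch selection can be sketched as evaluating each branch's condition and running every branch whose condition returns exactly true. The names selectBranches and MAX_BRANCHES, the condition functions, and the rule that a truthy non-boolean (such as 1) does not trigger a branch are assumptions made to illustrate the boolean requirement.

```javascript
// Hypothetical sketch of Process Control branch selection.
var MAX_BRANCHES = 5; // up to five branches, per this section

function selectBranches(record, branches) {
  if (branches.length > MAX_BRANCHES) {
    throw new Error("At most " + MAX_BRANCHES + " branches are supported");
  }
  // Every branch whose condition is strictly true is executed.
  return branches
    .filter(function (b) { return b.condition(record) === true; })
    .map(function (b) { return b.name; });
}

var branches = [
  { name: "alarm", condition: function (r) { return r.temp > 80; } },
  { name: "archive", condition: function (r) { return true; } },
  { name: "nonBoolean", condition: function (r) { return 1; } } // not boolean: never runs
];
```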

Data Desensitization

Desensitizes multiple fields. The desensitized data overwrites the original data and flows to downstream operators.

Data Output

Data Output

The configuration is similar to that of the input data source. For details, see Data Input Node.

  • supOS instance

  • File data source

  • API data source

Message Queue Output

Kafka, MQTT, and RocketMQ are supported.

Data Warehouse Output

HBASE Output

info
  • An HBase data source only accepts string data.
  • rowKey is the primary key of an HBase data source, so a source field must be mapped to it. If the mapped field contains duplicate values, only the latest row of data is retained.
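The two HBase rules can be sketched as a preparation step: convert every value to a string, then collapse rows that share a rowKey so that the later row wins. Interpreting "latest" as the last row in input order, keeping null values as null, and the name prepareHBaseRows are assumptions.

```javascript
// Hypothetical sketch of preparing rows for HBase output.
function prepareHBaseRows(records, rowKeyField) {
  var byKey = Object.create(null);
  var keys = [];
  records.forEach(function (r) {
    var row = {};
    for (var k in r) {
      // HBase output accepts strings only; nulls are kept as null (assumption).
      row[k] = r[k] == null ? null : String(r[k]);
    }
    var key = row[rowKeyField];
    if (!(key in byKey)) keys.push(key);
    byKey[key] = row; // a later row with the same rowKey replaces the earlier one
  });
  return keys.map(function (k) { return byKey[k]; });
}

var hbaseRows = prepareHBaseRows(
  [{ id: 1, v: 2.5 }, { id: 2, v: 3 }, { id: 1, v: 7 }],
  "id"
);
```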